ABSTRACT

We consider local measurement error theory for logistic regression,
which is applied to four different methods: ordinary logistic regression
without accounting for measurement error, a functional maximum likelihood
estimate, an estimate based on linearizing the logistic function, and an
estimator conditioned on certain appropriate sufficient statistics. Our
asymptotic theory includes a bias-variance trade-off, which we use to
construct new estimators with better asymptotic and small sample properties.

1.  INTRODUCTION


Logistic regression is the most used form of binary regression; see
Berkson (1951), Cox (1970), Efron (1975), and Pregibon (1981). We observe
independent observations $(Y_1,x_1),\ldots,(Y_N,x_N)$, where $(x_i)$ are fixed
p-vector predictors and the $(Y_i)$ are Bernoulli variates satisfying

(1.1)  $\Pr\{Y_i = 1 \mid x_i\} = F(x_i^T\beta) = \{1 + \exp(-x_i^T\beta)\}^{-1}$.

Under regularity conditions, the maximum likelihood estimate $\hat\beta$ for
$\beta$ satisfies

$N^{1/2}(\hat\beta - \beta) \Rightarrow N(0, S^{-1})$,  where

(1.2)  $\lim_{N\to\infty} N^{-1}\sum_{i=1}^{N} x_i x_i^T F(x_i^T\beta)\{1 - F(x_i^T\beta)\} = S$.
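
As a concrete illustration of (1.1) and (1.2), the following minimal Python sketch fits a logistic regression by Newton-Raphson and evaluates the finite-sample analogue of S at the estimate. The function names, the simulated design and the parameter values are assumptions made purely for illustration; they are not taken from the study.

```python
import numpy as np

def logistic(t):
    """F(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(x, y, n_iter=50, tol=1e-12):
    """Newton-Raphson solution of sum_i x_i {y_i - F(x_i' b)} = 0."""
    n, p = x.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        f = logistic(x @ b)
        score = x.T @ (y - f) / n
        info = (x * (f * (1.0 - f))[:, None]).T @ x / n
        step = np.linalg.solve(info, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

def information(x, b):
    """Finite-N analogue of S in (1.2): N^{-1} sum x_i x_i' F (1 - F)."""
    f = logistic(x @ b)
    return (x * (f * (1.0 - f))[:, None]).T @ x / x.shape[0]

# Hypothetical design: intercept plus one standard normal predictor.
rng = np.random.default_rng(0)
n = 2000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([-1.0, 1.0])
y = (rng.uniform(size=n) < logistic(x @ beta)).astype(float)

b_hat = fit_logistic(x, y)
s_n = information(x, b_hat)
# Approximate standard errors from N^{1/2}(b_hat - beta) => N(0, S^{-1}).
std_err = np.sqrt(np.diag(np.linalg.inv(s_n)) / n)
print(b_hat, std_err)
```

The standard errors here rely on the predictors being observed without error; the remainder of the paper concerns what happens when they are not.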

The  motivation  for  our  paper  is  the  Framingham  Heart  Study  (Gordon  and 
Kannel  (1968)),  a  prospective  study  of  the  development  of  cardiovascular 
disease.  This  ongoing  investigation  has  had  an  important  impact  on  the 
epidemiology of heart disease. Much of the analysis is based on the logistic
regression model with $(Y_i)$ being various indicators of heart disease and
$(x_i)$ being vectors of baseline risk factors such as systolic blood pressure,
serum cholesterol, smoking, etc. It is well known that many of these baseline
predictors are measured with substantial error. For example, in one
group of 45-54 year old Framingham males, we estimated that 25% of the
observed variability in systolic blood pressure is really "measurement" error
due to reader-machine variability, time of day, day of week, etc. The
second  author  was  asked  by  some  Framingham  investigators  to  assess  the  impact 
of  such  substantial  measurement  error  and  to  suggest  alternatives  to  usual 
logistic  regression  which  account  for  this  error.  The  present  study  is  an 
outgrowth  of  these  questions. 

There  are  three  major  effects  of  measurement  error  on  ordinary  logistic 
regression. First is bias, which becomes larger with larger measurement
error; see Michalik and Tripathi (1980). Second is attenuation, i.e., a
tendency  to  underestimate  the  disease  probability  for  high  risk  cases  and 
overestimate  for  low  risk  cases;  the  nature  of  the  attenuation  is  made  more 
explicit  in  Section  2.  Thirdly,  there  is  the  problem  of  hypothesis  testing; 
we  show  the  not  generally  well-known  fact  that  the  usual  tests  for  individual 
regression  components  can  have  level  higher  than  the  nominal  level.  An 
example  where  this  occurs  is  an  unbalanced  two-group  analysis  of  covariance, 
where  one  is  interested  in  testing  for  treatment  effect  but  the  covariable 
is  measured  with  error.  We  believe  we  are  the  first  to  provide  an  explicit 
demonstration  of  this  testing  phenomenon.  Finally,  the  availability  of 
techniques  which  correct  for  measurement  error  can  make  clear  the  need  for 
better  measurement,  e.g.,  more  blood  pressure  readings  over  a  period  of  days. 

Our measurement error model begins with (1.1), but rather than observing
the p-vector $x_i$ we observe

(1.3)  $X_i = x_i + \Sigma^{1/2}\epsilon_i / m^{1/2}$,

where $\Sigma^{1/2}$ is the square root of a symmetric positive semi-definite
matrix $\Sigma$, and $(\epsilon_i)$ are i.i.d. random vectors with identity
covariance and zero first moments. Here $m$ is known to the experimenter and
will be discussed shortly.

In most cases, some components of $(x_i)$ will be measured without error.
If we view these as the first $r$ components of $(x_i)$, then we have

(1.4)  $\Sigma = \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_* \end{pmatrix}$,

where $\Sigma_*$ is positive definite. This convention is used throughout in an
effort to reduce notation. Also, in most cases the measurement error
covariance $\Sigma$ will be unknown, so where necessary we will assume the
existence of estimators $\hat\Sigma$ satisfying

(1.5)  $N^{1/2}(\hat\Sigma - \Sigma) = O_p(1)$.
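
To make the error model (1.3)-(1.4) concrete, here is a small Python sketch that adds measurement error of covariance $\Sigma/m$ to a design whose first component (the intercept) is error free. The helper names and all numerical values are hypothetical, and normal errors are used only for convenience; the model itself requires only zero mean and identity covariance for the $\epsilon_i$.

```python
import numpy as np

def psd_sqrt(sigma):
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(sigma)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def add_measurement_error(x, sigma, m, rng):
    """Return X_i = x_i + Sigma^{1/2} eps_i / m^{1/2}, one row per observation."""
    eps = rng.standard_normal(x.shape)  # identity covariance, zero mean
    return x + eps @ psd_sqrt(sigma) / np.sqrt(m)

rng = np.random.default_rng(1)
n, m = 200_000, 4
x = np.column_stack([np.ones(n), rng.normal(size=n)])
# Block structure of (1.4): the intercept (r = 1) is measured without error.
sigma = np.array([[0.0, 0.0],
                  [0.0, 0.5]])
X = add_measurement_error(x, sigma, m, rng)
err_cov = np.cov((X - x).T)
print(err_cov)  # should be close to sigma / m
```

With $m$ replicates averaged, the error covariance of the averaged measurement is $\Sigma/m$, which is why $m \to \infty$ corresponds to vanishing measurement error.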

We study the following four estimators.

Estimator #1 is ordinary logistic regression naively calculated using
the observable $(X_i)$; the estimate $\hat\beta_{LU}$ satisfies

(1.6)  $0 = \sum_{i=1}^{N} X_i \{Y_i - F(X_i^T\hat\beta_{LU})\}$.

Estimator #2 Assuming that the errors in (1.3) are normally distributed and
$\Sigma$ is known, one can maximize the joint likelihood for $(Y_i, X_i)$ and
then replace $\Sigma$ by $\hat\Sigma$. This is a type of functional maximum
likelihood estimate $\hat\beta_F$, resulting in the equations

(1.7)  $0 = \sum_{i=1}^{N} \hat x_i \{Y_i - F(\hat x_i^T\hat\beta_F)\}$,

(1.8)  $\hat x_i = X_i + m^{-1}\hat\Sigma\hat\beta_F \{Y_i - F(\hat x_i^T\hat\beta_F)\}$,  $i = 1,\ldots,N$.

Estimator #3 This is $\hat\beta_C$, due to Clark (1982) and based on Bayes-type
estimates of $(x_i)$ given $(X_i)$ or the near linearity of $F(\cdot)$ on
$[-3,3]$ (Cox (1970, pp. 89-90)) combined with ideas of Fuller (1980). Define
$(\hat\Sigma_X, \bar X)$ as the sample covariance and mean of the $(X_i)$.
$\hat\Sigma_X$ will be partitioned similarly to (1.4), and $\hat\Sigma_X^{-1}$
will be the obvious generalized inverse. Clark's $\hat\beta_C$ is usual
logistic regression based on

(1.9)  $\hat x_{i,C} = X_i - m^{-1}\hat\Sigma\hat\Sigma_X^{-1}(X_i - \bar X)$.
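
For a single error-prone predictor, the adjustment in (1.9) reduces to shrinking each observed value toward the sample mean. The sketch below is a scalar illustration only; the function name and the numbers are hypothetical, and the general matrix case with a generalized inverse is not attempted.

```python
import numpy as np

def clark_adjust(X, sigma2_hat, m):
    """Scalar form of (1.9): X_i - (sigma2_hat / m) / var(X) * (X_i - mean(X))."""
    X = np.asarray(X, dtype=float)
    shrink = sigma2_hat / (m * np.var(X, ddof=1))
    return X - shrink * (X - X.mean())

rng = np.random.default_rng(2)
n, m, sigma2 = 1000, 2, 0.5
x_true = rng.normal(size=n)
X_obs = x_true + rng.normal(scale=np.sqrt(sigma2 / m), size=n)
x_adj = clark_adjust(X_obs, sigma2, m)
print(np.var(X_obs, ddof=1), np.var(x_adj, ddof=1))
```

Because the adjusted predictor is an affine function of the observed one, a logistic regression with an intercept fitted to the adjusted values has slope equal to the naive slope divided by one minus the shrinkage factor, so in this scalar setting the correction acts as a rescaling of the naive estimate.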

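Estimator #4 We believe we are the first to introduce what we call the
sufficiency estimator $\hat\beta_S$. Given $(\Sigma,\beta)$, a sufficient
statistic for $x_i$ assuming normal errors in (1.3) is

(1.10)  $T_i(\beta,\Sigma) = X_i + (Y_i - 1/2)\, m^{-1}\Sigma\beta$.

Again assuming normal errors, the conditional probability that $(Y_i = 1)$ given
$\Sigma$, $\beta$ and $T_i(\beta,\Sigma)$ is

(1.11)  $\Pr\{Y_i = 1 \mid T_i\} = F(\beta^T T_i(\beta,\Sigma))$.

This suggests solving for $\hat\beta_S$ in

(1.12)  $0 = \sum_{i=1}^{N} T_i(\hat\beta_S,\hat\Sigma)\{Y_i - F(\hat\beta_S^T T_i(\hat\beta_S,\hat\Sigma))\}$.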

It is not difficult to show that estimators #1, 2 and 3 are weakly
consistent under conditions (2.2)-(2.4) provided $\min(m,N) \to \infty$. There
is a minor problem with the sufficiency estimator in that equation (1.12) may
have multiple solutions, not all of which lead to a consistent sequence. To
guarantee uniqueness and consistency as $\min(m,N) \to \infty$ we will take
$\hat\beta_S$ to be the solution to (1.12) which is closest to $\hat\beta_{LU}$.
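
One way to look for a root of (1.12) near the naive estimate is a simple fixed-point scheme: start at $\hat\beta_{LU}$, form the statistics $T_i$ at the current value, refit an ordinary logistic regression on them, and repeat. The Python sketch below is only an illustration under stated assumptions: it takes $T_i(\beta,\Sigma) = X_i + (Y_i - 1/2)\Sigma\beta/m$ (the form consistent with the expansion terms used in Section 6), treats $\Sigma$ as known, and does not check that the root found is in fact the closest one. All names and numerical values are hypothetical.

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(z, y, start, n_iter=25):
    """Newton-Raphson for sum_i z_i {y_i - F(z_i' b)} = 0, started at `start`."""
    b = np.array(start, dtype=float)
    for _ in range(n_iter):
        f = logistic(z @ b)
        info = (z * (f * (1.0 - f))[:, None]).T @ z
        b = b + np.linalg.solve(info, z.T @ (y - f))
    return b

def suff_stat(X, y, b, sigma, m):
    """Assumed form of T_i(b, Sigma): X_i + (Y_i - 1/2) Sigma b / m."""
    return X + np.outer(y - 0.5, sigma @ b) / m

def sufficiency_estimate(X, y, sigma, m, n_outer=60):
    """Fixed-point iteration on (1.12), started from the naive estimate."""
    b_naive = fit_logistic(X, y, np.zeros(X.shape[1]))
    b = b_naive.copy()
    for _ in range(n_outer):
        b = fit_logistic(suff_stat(X, y, b, sigma, m), y, b)
    return b_naive, b

# Hypothetical data: intercept (error free) plus one mismeasured predictor.
rng = np.random.default_rng(3)
n, m = 5000, 2
sigma = np.array([[0.0, 0.0], [0.0, 0.5]])
beta = np.array([-1.0, 1.0])
x = np.column_stack([np.ones(n), rng.normal(size=n)])
X = x.copy()
X[:, 1] += rng.normal(scale=np.sqrt(0.5 / m), size=n)
y = (rng.uniform(size=n) < logistic(x @ beta)).astype(float)

b_naive, b_suff = sufficiency_estimate(X, y, sigma, m)
print("naive:", b_naive, "sufficiency:", b_suff)
```

A Newton or quasi-Newton solver applied directly to (1.12) would be a reasonable alternative; the fixed-point form is used here only because it reuses an ordinary logistic fit.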

Model (1.3) is appropriate for two situations as $\min(m,N) \to \infty$: (i) $m$
independent replicates of $(x_i)$ exist, in which case the $(\epsilon_i)$ become
effectively normally distributed, and (ii) a local model in which measurement
error is small but nonnegligible. In the latter case the moments of order
greater than two of $(\epsilon_i)$ generally differ from those of a normal
variate.

The asymptotic theory is dictated by a bias-variance trade-off. Fixing
$m$ in (1.3) and letting $N \to \infty$, the estimators are generally
inconsistent, and the resulting bias terms are complicated functionals of the
error law and give little insight into the construction of good estimators.
Fixing $N$ and letting $m \to \infty$ in (1.3), all estimators reduce to the
same quantity. Thus, to obtain useful insight into the behavior of the
estimators, it seems reasonable to let $(m,N) \to \infty$ simultaneously.

In this paper we will first compute the asymptotic distributions of the
estimators as $m \to \infty$, $N/m^2 \to \lambda^2 < \infty$. In this set-up,
all the estimators have the same asymptotic covariance matrix but have
different asymptotic biases. These asymptotic biases provide a way to compare
the various estimators, as well as to verify the attenuation and hypothesis
testing difficulties mentioned earlier.

As a second step, we will use the asymptotic biases found in the first
step to suggest simple improvements of the estimators with smaller bias. We
then sketch an interesting theory for larger measurement errors,
$N/m^4 \to \lambda^2$. A small Monte-Carlo study confirms that in large samples
our asymptotics can be useful in better understanding the measurement error
problem.


2.  ASYMPTOTIC  DISTRIBUTIONS  FOR  THE  USUAL  METHODS 


In this section, we first state the main asymptotic results for the four
estimators assuming $N/m^2 \to \lambda^2$. At the end of the section we discuss
the statistical implications of the results through examples. Proofs are given
in Section 6. For the results in this section we require only that
$(\hat\Sigma - \Sigma) = o_p(1)$.


Theorem 1: (Ordinary Logistic Regression $\hat\beta_{LU}$) Define

(2.1)  $S_N(\gamma) = N^{-1}\sum_{i=1}^{N} F^{(1)}(x_i^T\gamma)\, x_i x_i^T$,

where $F^{(k)}(\cdot)$ is the $k$th derivative of $F$. Make the following four
assumptions:

(2.2)  $N^{-1}\sum_{i=1}^{N}\|x_i\|^2 = O(1)$,  $\max_{1\le i\le N}\|x_i\|^2 = o(N)$;

(2.3)  There exists a positive definite matrix $M$ such that $S_N(\gamma) > M$
for $\gamma$ in a neighborhood of $\beta$ and $N$ sufficiently large, and
$S_N(\beta) \to S$;

(2.4)  $E(\epsilon) = 0$,  $E(\epsilon\epsilon^T) = I$,  $E\|\epsilon\|^{2+\delta} < \infty$ for some $\delta > 0$;

(2.5)  $N/m^2 \to \lambda^2$,  $0 \le \lambda < \infty$.

Then the ordinary logistic regression estimate satisfies

(2.6)  $N^{1/2}(\hat\beta_{LU} - \beta) \xrightarrow{L} N(-\lambda S^{-1}c_{LU}, S^{-1})$,  where

$c_{LU} = \lim_{N\to\infty} N^{-1}\sum_{i=1}^{N}\{F^{(1)}(x_i^T\beta)\,\Sigma\beta + \beta^T\Sigma\beta\, F^{(2)}(x_i^T\beta)\, x_i/2\}$.

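Theorem 2: (Functional MLE $\hat\beta_F$). Under the assumptions of Theorem 1,

(2.7)  $N^{1/2}(\hat\beta_F - \beta) \xrightarrow{L} N(-\lambda S^{-1}c_F, S^{-1})$,  where
$c_F = c_{LU} - \lim_{N\to\infty} N^{-1}\sum_{i=1}^{N} F^{(1)}(x_i^T\beta)\,\Sigma\beta$.

Theorem 3: (Sufficiency Estimator $\hat\beta_S$). Under the assumptions of Theorem 1,

(2.8)  $N^{1/2}(\hat\beta_S - \beta) \xrightarrow{L} N(0, S^{-1})$.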

Theorem 4: (Clark's estimator $\hat\beta_C$). In addition to the assumptions of Theorem
1 ,  assume  ^  +  $  where  ^x  is  the  covariance  matrix  of  the  predictors  (x .) 

Then 

(2.9)  $N^{1/2}(\hat\beta_C - \beta) \xrightarrow{L} N(-\lambda S^{-1}c_{CL}, S^{-1})$,  where

1  N  x 

ccL=cLU  -  lim  n  If  A(x.-x)F(xfB) 

N-*°°  i=l  1  1 

+xi(xi-7)TATBF(1)(x|B) 

i  N 

x  =  lim  N  1  l  x.,  A  =  (tx+^)  j:  . 

N-**>  1 

Again  (tx+$)  1  is  defined  by  the  convention  in  Section  1. 

DISCUSSION 

Two comments are in order. First, Theorem 1 provides an asymptotic theory
for logistic regression when there is no measurement error by simply taking
$\Sigma = 0$. Second, in the first-pass asymptotic theory developed here, the
estimators differ only in their limiting bias. From this perspective, the
sufficiency estimator is necessarily best because it has no limiting bias. In the
next  section  we  produce  a  new  asymptotic  theory  which  casts  the  sufficiency 
estimator  in  a  different  light.  Before  undertaking  this  task  we  comment  on 
the  two  examples  alluded  to  previously. 

Example #1 Consider simple logistic regression through the origin with
$\beta > 0$. We expect to see attenuation, i.e., negative bias terms in (2.6),
(2.7) and (2.9). We will call the opposite, overestimation of $\beta$,
overcompensation. It is easy to show that $-\lambda S^{-1}c_F$ is always
positive, so the functional mle overcompensates, a most surprising finding. On
the other hand, for most designs $-\lambda S^{-1}c_{LU}$ is negative, indicating
underestimation or attenuation of $\beta$ for usual logistic regression.
Somewhat surprisingly and completely at variance with the linear regression
case, $-\lambda S^{-1}c_{LU}$ can be positive, i.e., usual logistic regression
can overcompensate. One design in which this occurs arises when most cases have
very high or very low risk.
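
The sign claims in Example #1 can be checked numerically for a finite design by replacing the limits in Theorems 1 and 2 with sample averages. The sketch below does this for scalar regression through the origin; the two designs, the value of $\beta$ and the error variance are hypothetical choices, and the returned quantities are the bias factors $-c_{LU}/S$ and $-c_F/S$ (to be multiplied by $\lambda$).

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def bias_factors(x, beta, sigma2):
    """Finite-N versions of -c_LU / S and -c_F / S for scalar x, no intercept."""
    f = logistic(beta * x)
    f1 = f * (1.0 - f)             # F^(1)
    f2 = f1 * (1.0 - 2.0 * f)      # F^(2)
    s = np.mean(f1 * x ** 2)
    c_lu = np.mean(f1 * sigma2 * beta + beta ** 2 * sigma2 * f2 * x / 2.0)
    c_f = c_lu - np.mean(f1) * sigma2 * beta
    return -c_lu / s, -c_f / s

beta, sigma2 = 1.0, 0.25
moderate = np.array([-0.5, 0.5])   # risks near one half
extreme = np.array([-4.0, 4.0])    # risks very low or very high

lu_mod, f_mod = bias_factors(moderate, beta, sigma2)
lu_ext, f_ext = bias_factors(extreme, beta, sigma2)
print("moderate design:", lu_mod, f_mod)
print("extreme design: ", lu_ext, f_ext)
```

In the moderate design the ordinary logistic factor is negative (attenuation), in the extreme design it is positive (overcompensation), and the functional factor is positive in both.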

Example #2 Consider a two-group analysis of covariance, $x_i^T = (1, (-1)^i, d_i)$,
$\beta^T = (\beta_0, \beta_1, \beta_2)$. We measure the covariable $d_i$ with
error variance $\sigma^2$. Often, interest lies in testing hypotheses about the
treatment effect $\beta_1$. A standard method to test $\beta_1 = 0$ is to
compare its logistic regression estimate to the usual asymptotic standard
error. Theorem 1, through (2.6), suggests that this test will actually approach
its nominal level only if the second component of $S^{-1}c_{LU}$ is zero.
Denoting the second row of $S^{-1}$ by $s_2^T$, we see that the correct level
is achieved only if

(2.10)  0  =  lim  N-1  l  sIx.F^  (xT0)o20?  . 

N-**>  i=l  *  1  1 

The  last  will  not  hold  in  the  common  epidemiologic  situation  in  which  the 
true  covariables  are  not  balanced  across  the  two  treatments.  Thus,  when 
substantial  measurement  error  occurs  in  a  nonrandomized  study,  we  can  expect 


bias  in  the  levels  of  the  usual  tests.  Similar  results  hold  for  multiple 
logistic  regression.  Of  course,  in  a  randomized  study  (2.10)  will  be  true, 
so  that  the  ordinary  tests  would  be  appropriate. 
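
The contrast between balanced and unbalanced covariables in Example #2 can be illustrated by evaluating the second component of $S^{-1}c_{LU}$ for a finite design, using the expression for $c_{LU}$ in Theorem 1 with sample averages in place of limits. The covariable grids, the parameter values and the function name below are hypothetical, and the null hypothesis $\beta_1 = 0$ is imposed.

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def treatment_bias_component(d_minus, d_plus, beta, sigma2):
    """Second component of S^{-1} c_LU for rows x_i = (1, g_i, d_i), g_i = -1 or +1."""
    g = np.concatenate([-np.ones(len(d_minus)), np.ones(len(d_plus))])
    d = np.concatenate([d_minus, d_plus])
    x = np.column_stack([np.ones(len(d)), g, d])
    f = logistic(x @ beta)
    f1 = f * (1.0 - f)
    f2 = f1 * (1.0 - 2.0 * f)
    S = (x * f1[:, None]).T @ x / len(d)
    Sigma = np.diag([0.0, 0.0, sigma2])   # only the covariable d has error
    c_lu = (np.mean(f1) * (Sigma @ beta)
            + (beta @ Sigma @ beta) * (x * f2[:, None]).mean(axis=0) / 2.0)
    return np.linalg.solve(S, c_lu)[1]

beta = np.array([-1.4, 0.0, 1.4])        # beta_1 = 0: no treatment effect
grid = np.linspace(-1.0, 1.0, 50)
balanced = treatment_bias_component(grid, grid, beta, sigma2=0.3)
unbalanced = treatment_bias_component(grid, grid + 1.0, beta, sigma2=0.3)
print(balanced, unbalanced)
```

With identical covariable values in the two groups the component vanishes up to rounding, whereas shifting one group's covariables produces a clearly nonzero value, which is the situation in which the usual test has the wrong level.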

3.  CORRECTED  ESTIMATORS 

In the previous section we computed asymptotic distributions when
$N/m^2 \to \lambda^2$. Both usual logistic and functional regression had
asymptotic bias terms. Since we have explicit and fairly simple expressions for
these bias terms, it seems reasonable to suppose that new estimators can be
constructed which have no asymptotic bias under the set-up of Section 2. We
will define such estimators and consider their distributions under the weaker
condition $N/m^4 \to \lambda^2$.

There are many modifications of ordinary and functional regression which
have no asymptotic bias as $N/m^2 \to \lambda^2$. For ordinary logistic
regression, we have found it simplest to merely subtract an estimate of the
bias, obtaining

(3.1)  $\hat\beta_{LUM} = \{I + S_N^{-1}(\hat\beta_{LU})\, J_N \hat\Sigma / m\}\,\hat\beta_{LU}$,

where $S_N(\cdot)$ is given by (2.1) with the observed $X_i$ replacing $x_i$ and

$J_N = N^{-1}\sum_{i=1}^{N} \{F^{(1)}(X_i^T\hat\beta_{LU})\, I + (1/2)\, F^{(2)}(X_i^T\hat\beta_{LU})\, X_i \hat\beta_{LU}^T\}$.
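
A direct transcription of the bias-corrected estimator into Python might look as follows. It assumes the correction takes the form $\hat\beta_{LU} + S_N^{-1} J_N \hat\Sigma \hat\beta_{LU}/m$, with $J_N\hat\Sigma\hat\beta_{LU}$ the plug-in version of $c_{LU}$ from Theorem 1; the function names, the simulated data and the use of the true $\Sigma$ in place of an estimate are illustrative assumptions.

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(z, y, n_iter=25):
    b = np.zeros(z.shape[1])
    for _ in range(n_iter):
        f = logistic(z @ b)
        info = (z * (f * (1.0 - f))[:, None]).T @ z
        b = b + np.linalg.solve(info, z.T @ (y - f))
    return b

def corrected_logistic(X, y, sigma_hat, m):
    """Naive estimate and its bias-corrected version (assumed form of (3.1))."""
    n, p = X.shape
    b = fit_logistic(X, y)
    f = logistic(X @ b)
    f1 = f * (1.0 - f)                 # F^(1) at X_i' b
    f2 = f1 * (1.0 - 2.0 * f)          # F^(2) at X_i' b
    S = (X * f1[:, None]).T @ X / n
    J = np.mean(f1) * np.eye(p) + 0.5 * np.outer((X * f2[:, None]).mean(axis=0), b)
    return b, b + np.linalg.solve(S, J @ sigma_hat @ b) / m

rng = np.random.default_rng(4)
n, m = 5000, 2
sigma = np.array([[0.0, 0.0], [0.0, 0.5]])
beta = np.array([-1.0, 1.0])
x = np.column_stack([np.ones(n), rng.normal(size=n)])
X = x.copy()
X[:, 1] += rng.normal(scale=np.sqrt(0.5 / m), size=n)
y = (rng.uniform(size=n) < logistic(x @ beta)).astype(float)

b_lu, b_lum = corrected_logistic(X, y, sigma, m)
print("naive:", b_lu, "corrected:", b_lum)
```

Setting the error covariance to zero returns the naive estimate unchanged, which is a quick sanity check on any implementation.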

For functional maximum likelihood, we instead modify the estimators of $(x_i)$,
replacing (1.8) by

(3.2)  $\hat x_i(\beta) = X_i + (1/m)\{Y_i - F(X_i^T\beta)\} \times \{\hat\Sigma\beta + (1/2)\,\beta^T\hat\Sigma\beta\,(1 - 2F(X_i^T\beta))\,X_i\}$.

The result will be denoted by $\hat\beta_{FM}$. We first show that these
estimators do correct for bias. The results in this section require the full
force of (1.5).


Theorem 5: Suppose $N/m^2 \to \lambda^2$ $(0 \le \lambda < \infty)$ and the
assumptions of Theorem 1 hold. Then the modified estimators $\hat\beta_{LUM}$,
$\hat\beta_{FM}$ and the sufficiency estimator $\hat\beta_S$ all have the same
limit distribution of Theorem 3.

Actually,  Theorem  5  is  a  corollary  of  this  more  general  result. 


Theorem 6: Suppose $N/m^4 \to \lambda^2$ $(0 \le \lambda < \infty)$ and that the
conditions of Theorem 1 hold. In addition, assume

(3.3)  $(\epsilon_i)$ have zero third moments and
       $E\|\epsilon_i\|^{4+\delta} < \infty$ for some $\delta > 0$.

Then the modified logistic, modified functional and sufficiency estimators,
when placed in the form $N^{1/2}(\hat\beta - \beta)$, are asymptotically normally
distributed with covariance $S^{-1}$ and bias terms of the form
$-\lambda S^{-1}c$. For the sufficiency estimator,

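(3.4)  $c_S = (1/24)\,(\beta^T\Sigma\beta) \lim_{N\to\infty} N^{-1}\sum_{i=1}^{N} \{4F^{(3)}(x_i^T\beta)\,\Sigma^{1/2}(Q - 3I)\Sigma^{1/2}\beta + \beta^T\Sigma^{1/2}(Q - 3I)\Sigma^{1/2}\beta\, F^{(4)}(x_i^T\beta)\, x_i\}$,

where $Q$ satisfies

$E\{\Sigma^{1/2}\epsilon\,(\beta^T\Sigma^{1/2}\epsilon)^3\} = \Sigma^{1/2} Q \Sigma^{1/2}\beta\,(\beta^T\Sigma\beta)$.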

The  other  bias  terms  are  extremely  complex. 


DISCUSSION 

The important points about Theorem 6 are two. First, we can expect the
modified estimators to improve on their unmodified versions; this is confirmed
to some extent in the simulation. Second, the asymptotics here show the
effect of nonnormality on the sufficiency estimator. If the errors $(\epsilon_i)$
are normally distributed, then $Q = 3I$ and the bias term $c_S = 0$. Thus in
large scale studies with normally distributed measurement error, we can expect
the sufficiency estimator to perform quite well. Equation (3.4) suggests that
the sufficiency estimator will behave less well for decidedly non-normal
measurement errors.

4.  MONTE-CARLO 

We  performed  a  limited  Monte-Carlo  study,  designed  to  help  answer  three 
questions.  Are  the  corrected  estimates  of  any  value?  Is  Clark's  estimator 
worth  further  study?  Is  the  asymptotic  theory  any  guide  to  the  performance 
of  the  sufficiency  estimator? 

The model for the study was

(4.1)  $\Pr\{Y_i = 1 \mid x_i\} = F(\alpha + \beta x_i)$,  $i = 1,\ldots,N$.

We considered these sampling situations, where $\chi_1^2$ denotes a chi-squared
random variable with one degree of freedom:

(I)  $(\alpha,\beta) = (-1.4, 1.4)$, $(x_i) \sim$ Normal$(0, \sigma_x^2 = .10)$, $N = 300, 600$;

(II) $(\alpha,\beta) = (-1.4, 1.4)$, $(x_i) \sim \sigma_x(\chi_1^2 - 1)/\sqrt{2}$, $\sigma_x^2 = .10$, $N = 300, 600$.

For both cases, the measurement error variance $\sigma^2$ was one third the
variance $\sigma_x^2$ of the true predictors $(\sigma^2 = \sigma_x^2/3)$. For
each case, we considered two sampling distributions for the measurement errors
$(\epsilon_i)$: (a) Normal$(0,\sigma^2)$ and (b) a contaminated normal
distribution, which is Normal$(0,\sigma^2)$ with probability 0.90 and
Normal$(0,25\sigma^2)$ with probability 0.10.
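
The sampling situations above can be generated along the following lines. This is a sketch rather than the code used for the study: the function name and interface are hypothetical, and it assumes that each of the $m$ replicate measurements carries an independent error drawn from distribution (a) or (b).

```python
import numpy as np

def simulate(case, error, n, rng, alpha=-1.4, beta=1.4, var_x=0.10, m=2):
    """Draw true predictors x, replicate measurements X (n by m) and responses y."""
    sd_x = np.sqrt(var_x)
    if case == "I":
        x = rng.normal(0.0, sd_x, size=n)
    else:  # case "II": standardized chi-squared(1), scaled to variance var_x
        x = sd_x * (rng.chisquare(1, size=n) - 1.0) / np.sqrt(2.0)
    sigma2 = var_x / 3.0
    if error == "a":
        e = rng.normal(0.0, np.sqrt(sigma2), size=(n, m))
    else:  # error "b": contaminated normal, variance 25 sigma2 with prob. 0.10
        wide = rng.uniform(size=(n, m)) < 0.10
        e = np.where(wide, 5.0, 1.0) * np.sqrt(sigma2) * rng.standard_normal((n, m))
    p = 1.0 / (1.0 + np.exp(-(alpha + beta * x)))
    y = (rng.uniform(size=n) < p).astype(float)
    return x, x[:, None] + e, y

rng = np.random.default_rng(5)
x, X, y = simulate("II", "b", 600, rng)
print(X.shape, y.mean())
```

Under distribution (b) the per-replicate error variance is $0.9\sigma^2 + 0.1 \cdot 25\sigma^2 = 3.4\sigma^2$, so the contaminated case also carries more total error than case (a).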

We  believe  these  two  sampling  situations  are  realistic,  but  of  course 
in  such  a  small  study  they  are  not  representative.  To  those  used  to  linear 
regression,  the  sample  sizes  N  =  300,  600  may  appear  large,  but  our  major 
interest  is  in  larger  epidemiologic  studies  where  such  sample  sizes  are  common. 
For  example,  Clark  (1982)  was  motivated  by  a  study  with  N  =  2580,  Kauck  (1983) 
quotes  a  partially  completed  study  with  N  >  340,  and  we  have  analyzed 
Framingham data for males aged 45-54 with N = 589. We would hesitate to
correct  for  measurement  error  in  most  small  sample  situations. 


The values of the predictor variance $\sigma_x^2$ and the measurement error
variance $\sigma^2$ are similar to those found in the Framingham cohort
mentioned in the previous paragraph when the predictor was
log{(systolic blood pressure - 75)/3}, a standard transformation. The ratio
$\sigma^2/\sigma_x^2 = 1/3$ is fairly common; Clark also finds this ratio in her
study of triglyceride. The choice of $(\alpha,\beta)$ comes from Framingham
data as well. All experiments were repeated 100 times.

In the experiment, we took $m = 2$ by observing independent replicates
$(X_{i1}, X_{i2})$ of each $x_i$. $\Sigma_*/2$ is in this case a scalar,
estimated by the sample variance of $(X_{i1} - X_{i2})/2$, while
$\Sigma_{x*} + \Sigma_*/2$ is also a scalar, estimated by the sample variance of
$(X_{i1} + X_{i2})/2$. We studied the following simple computational forms of
the various estimators.
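
The two variance estimates described above are straightforward to compute from replicate pairs. The following sketch uses hypothetical names and simulated normal data with the variances of the study; it returns the averaged measurement together with the two sample variances.

```python
import numpy as np

def replicate_summaries(X1, X2):
    """From replicate pairs, return the mean measurement and two variance estimates."""
    mean_X = (X1 + X2) / 2.0
    err_var_of_mean = np.var((X1 - X2) / 2.0, ddof=1)  # estimates error variance / 2
    total_var = np.var(mean_X, ddof=1)                 # estimates predictor variance + error variance / 2
    return mean_X, err_var_of_mean, total_var

rng = np.random.default_rng(6)
n, var_x = 100_000, 0.10
sigma2 = var_x / 3.0
x = rng.normal(0.0, np.sqrt(var_x), size=n)
X1 = x + rng.normal(0.0, np.sqrt(sigma2), size=n)
X2 = x + rng.normal(0.0, np.sqrt(sigma2), size=n)

mean_X, err_half, total = replicate_summaries(X1, X2)
print(err_half, total, total - err_half)  # last value estimates the predictor variance
```

The difference of the two estimates gives a moment estimate of the predictor variance, which in small samples can be negative and may need truncation at zero.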

1.  Ordinary  logistic  regression  solving  (1.6); 

2.  Clark's  linearized  estimator  which  does  ordinary  logistic  regression 
based  on  (1.9); 

3.  A one-step version of the functional maximum likelihood estimator.
On the right side of (1.8), replace $\hat x_i$ by $X_i$ and $\hat\beta_F$ by
$\hat\beta_{LU}$, obtaining a new $\hat x_i$. Then solve (1.7);

4.  Corrected ordinary regression (3.1);

5.  A one-step version of the corrected functional estimator. On the
right side of (3.2), replace $\beta$ by $\hat\beta_{LU}$. Then solve (1.7);

6.  A version of the sufficiency estimator obtained by solving (1.12)
but with $T_i(\hat\beta_S,\hat\Sigma)$ replaced by $T_i(\hat\beta_{LU},\hat\Sigma)$.

It can be shown that the one-step estimators defined in (3), (5), and (6)
differ from the full estimators only in the form of the asymptotic bias; e.g.,
the one-step version of $\hat\beta_S$ outlined in 6. is also asymptotically
normal provided $N/m^4 \to \lambda^2$; however, the bias term generally differs
from (3.4).
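
As an illustration of a one-step scheme of the kind listed above, the sketch below computes the one-step functional estimator: the naive estimate is plugged into the right side of $\hat x_i = X_i + m^{-1}\hat\Sigma\hat\beta\{Y_i - F(X_i^T\hat\beta)\}$ and an ordinary logistic regression is then fitted to the resulting $\hat x_i$. Function names and simulated data are hypothetical, and the true error covariance stands in for an estimate.

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(z, y, n_iter=25):
    b = np.zeros(z.shape[1])
    for _ in range(n_iter):
        f = logistic(z @ b)
        info = (z * (f * (1.0 - f))[:, None]).T @ z
        b = b + np.linalg.solve(info, z.T @ (y - f))
    return b

def one_step_functional(X, y, sigma_hat, m):
    """Naive fit, one update of the predictors, then a second logistic fit."""
    b_lu = fit_logistic(X, y)
    resid = y - logistic(X @ b_lu)
    x_hat = X + np.outer(resid, sigma_hat @ b_lu) / m
    return b_lu, fit_logistic(x_hat, y)

rng = np.random.default_rng(7)
n, m = 5000, 2
sigma = np.array([[0.0, 0.0], [0.0, 0.5]])
beta = np.array([-1.0, 1.0])
x = np.column_stack([np.ones(n), rng.normal(size=n)])
X = x.copy()
X[:, 1] += rng.normal(scale=np.sqrt(0.5 / m), size=n)
y = (rng.uniform(size=n) < logistic(x @ beta)).astype(float)

b_lu, b_f1 = one_step_functional(X, y, sigma, m)
print("naive:", b_lu, "one-step functional:", b_f1)
```

The update moves the predictors of cases with $Y_i = 1$ upward and those with $Y_i = 0$ downward (for a positive slope), so the refitted slope exceeds the naive one.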

Sweeping conclusions cannot be made from such a small study. Basically,
we can make the following qualitative suggestions. First, the ordinary logistic
estimator is less variable but more biased than the others; situations such
as N = 600 in the study or Clark's N = 2580 are such that bias dominates and
are hence candidates for using corrected estimators, with an opposite
conclusion for small sample sizes where variance dominates.

A second suggestion from the tables is that when the ordinary logistic
estimator loses efficiency (Cases I(b), II(b) and when N = 600), the corrected
estimators perform quite well. To some extent, these numbers justify
constructing the asymptotic theory of the paper, without which the corrected
estimators would not have been found.

Clark's estimator performs very well in this study when the true predictors
are normally distributed (Case I), but it does have a drop in
efficiency when the predictors are highly skewed as in the chi-squared Case II.
To some extent this is expected because the estimator is based on an assumption
of normally distributed predictors. It is surprising that the one-step
functional estimator computed here as well as the sufficiency estimator perform
so well when the measurement errors are not normally distributed (Cases
I(b), II(b)), as both were defined through an assumption of normal errors.
Note too that, as predicted from the theory, the corrected functional attenuates
the functional estimator.

5.  CONCLUDING  REMARKS 

Our asymptotic theory, which is interesting in itself, has proved useful
in two ways. First, heuristically, it provides a better understanding of
attenuation and it suggests a problem worth further study, namely, in what
situations can we expect usual inference ignoring measurement error to be of
the wrong level, i.e., at what point does increased bias overwhelm decreased
variance?

Besides  introducing  the  sufficiency  estimator,  we  have  also  used  the 
asymptotic  theory  to  construct  two  new  estimators  with  reasonable  large  sample 


properties;  all  three  of  these  along  with  Clark's  estimator  performed  well  in 
our  small  Monte-Carlo  study. 

The pressing practical problem now appears to be to delineate those
situations in which ordinary logistic regression should be corrected for its
bias. Studies of inference and more detailed comparison of alternative
estimators will be enhanced by the identification of problems where measurement
error severely affects the usual estimation and inference.

Finally, our method of asymptotics is similar to that in the interesting
work of Wolter and Fuller (1982) and Amemiya (1982) for nonlinear regression
models. The latter derives results for nonlinear regression similar in spirit
to our Theorems 1 and 2 and even suggests corrected estimates which satisfy
our Theorem 5. Because of these similarities, it is useful to emphasize that
the problem and model we have studied fundamentally differ from nonlinear
regression. The estimators we study and the results we have obtained are of
course not covered in the work of Wolter and Fuller (1982) and Amemiya (1982).

6.  PROOFS  OF  PRIMARY  RESULTS 

Because the number of unknown parameters increases with increasing sample
size, the classical results on consistency and asymptotic normality of maximum
likelihood estimates are not immediately applicable. As noted earlier,
conditions (2.2)-(2.4) and $\min(m,N) \to \infty$ are sufficient to insure
consistency of all the estimators in Section 2, subject to the caveats
regarding multiple solutions to (1.12) (details are available from the
authors). We will prove Theorems 1 and 2 and sketch the major steps in the proof
of Theorem 6 for the sufficiency estimator. Proofs of the other results, being
nearly identical, are omitted.

We start with a series of lemmas. In each case we assume (2.2)-(2.5)
and consistency of $\hat\Sigma$. Note that $\hat x_i$ is defined by (1.7), (1.8).


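Lemma 1: With $D_N = N^{-1}\sum_{i=1}^{N} Y_i(\hat x_i - X_i)$ we have
$N^{1/2}D_N = \lambda\Sigma\beta\, N^{-1}\sum_{i=1}^{N} F^{(1)}(x_i^T\beta) + o_p(1)$.

Proof:

$D_N = \hat\Sigma\hat\beta_F\, m^{-1} N^{-1}\sum_{i=1}^{N} Y_i\{Y_i - F(\hat x_i^T\hat\beta_F)\}$.

Write $N^{-1}\sum_{i=1}^{N} Y_i\{Y_i - F(\hat x_i^T\hat\beta_F)\} = A_1 + A_2$, where

$A_1 = N^{-1}\sum_{i=1}^{N} Y_i\{Y_i - F(x_i^T\beta)\}$,

$A_2 = N^{-1}\sum_{i=1}^{N} Y_i\{F(x_i^T\beta) - F(\hat x_i^T\hat\beta_F)\}$.

The difference between $A_1$ and its expectation is $o_p(1)$, that is

(6.1)  $A_1 = N^{-1}\sum_{i=1}^{N} F^{(1)}(x_i^T\beta) + o_p(1)$.

(6.2)  $|A_2| \le N^{-1}\sum_{i=1}^{N} |x_i^T\beta - \hat x_i^T\hat\beta_F|$
       $\le N^{-1}\sum_{i=1}^{N} (\|\hat x_i - x_i\|\,\|\hat\beta_F\| + \|x_i\|\,\|\hat\beta_F - \beta\|)$
       $\le N^{-1}\sum_{i=1}^{N} (\|X_i - x_i\|\,\|\hat\beta_F\| + \|\hat\Sigma\hat\beta_F\|\,\|\hat\beta_F\|/m + \|x_i\|\,\|\hat\beta_F - \beta\|)$
       $\le N^{-1}\sum_{i=1}^{N} (\|\Sigma^{1/2}\epsilon_i\|\,\|\hat\beta_F\|/m^{1/2} + \|\hat\Sigma\hat\beta_F\|\,\|\hat\beta_F\|/m + \|x_i\|\,\|\hat\beta_F - \beta\|)$,

and clearly this last term is $o_p(1)$. Finally (6.1), (6.2), (2.5) and
consistency of $\hat\Sigma$ and $\hat\beta_F$ complete the proof.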


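Lemma 2: With $R_N = N^{-1}\sum_{i=1}^{N}\{F(X_i^T\beta)X_i - F(\hat x_i^T\beta)\hat x_i\}$ we have $N^{1/2}R_N = o_p(1)$.

Proof: A Taylor series expansion of $F(\hat x_i^T\beta)$ about the point
$X_i^T\beta$ yields

(6.3)  $N^{1/2}R_N = -N^{-1/2}\sum_{i=1}^{N} F(X_i^T\beta)(\hat x_i - X_i) - N^{-1/2}\sum_{i=1}^{N} F^{(1)}(X_i^T\beta)\{(\hat x_i - X_i)^T\beta\}X_i - N^{-1/2}\sum_{i=1}^{N} r_i$,

where

$\|r_i\| \le \|\beta\|(1 + \|\hat x_i\|\,\|\beta\|)\,\|\hat x_i - X_i\|^2$  and  $\|\hat x_i - X_i\| \le \|\hat\Sigma\hat\beta_F\|\, m^{-1}$.

In light of (2.5) and consistency of $\hat\Sigma$ and $\hat\beta_F$,
$N^{-1/2}\sum_{i=1}^{N}\|r_i\| = o_p(1)$. The first term on the r.h.s. of (6.3)
equals

(6.4)  $-\hat\Sigma\hat\beta_F\, m^{-1} N^{-1/2}\sum_{i=1}^{N}\{Y_i - F(\hat x_i^T\hat\beta_F)\}F(X_i^T\beta)$.

With an argument similar to the one used in Lemma 1 we may replace
$\hat x_i^T\hat\beta_F$ by $x_i^T\beta$ in each of the summands in (6.4),
altering (6.4) only by a term which is $o_p(1)$. The resulting quantity is

(6.5)  $-\hat\Sigma\hat\beta_F\, m^{-1} N^{-1/2}\sum_{i=1}^{N}\{Y_i - F(x_i^T\beta)\}F(X_i^T\beta)$.

The normed sum in (6.5) has zero mean and asymptotically negligible variance;
thus the first term in (6.3) is $o_p(1)$. In a similar fashion one can show the
remaining term in (6.3) is $o_p(1)$, finishing the proof.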

Lemma 3: Define $T_{LU,N} = N^{-1}\sum_{i=1}^{N}\{Y_i - F(X_i^T\beta)\}X_i$; then
$N^{1/2}T_{LU,N}$ converges in law to a multivariate normal random variable with
mean $-\lambda c_{LU}$ and covariance matrix $S$.

Proof: $N^{1/2}T_{LU,N} = N^{-1/2}\sum_{i=1}^{N}\{Y_i - F(X_i^T\beta)\}(X_i - x_i) + N^{-1/2}\sum_{i=1}^{N}\{Y_i - F(X_i^T\beta)\}x_i$.

By expanding $F(X_i^T\beta)$ in a Taylor series around $x_i^T\beta$ in each of
the above sums we find, after recombining terms,

(6.6)  $N^{1/2}T_{LU,N} = N^{-1/2}\sum_{i=1}^{N}\{Y_i - F(x_i^T\beta)\}X_i$
       $- N^{-1/2}\sum_{i=1}^{N} (X_i - x_i)(X_i - x_i)^T\beta\, F^{(1)}(\tilde X_i^T\beta)$
       $- N^{-1/2}\sum_{i=1}^{N} (X_i - x_i)^T\beta\, F^{(1)}(x_i^T\beta)\, x_i$
       $- N^{-1/2}\sum_{i=1}^{N} \{(X_i - x_i)^T\beta\}^2 F^{(2)}(X_i^{*T}\beta)\, x_i/2$,

where $\tilde X_i$ and $X_i^*$ are on the line segment joining $X_i$ and $x_i$.
The third term in (6.6) is $o_p(1)$ by virtue of its zero mean and vanishing
variance, since $m \to \infty$. The first term may be written as

(6.7)  $N^{-1/2}\sum_{i=1}^{N}\{Y_i - F(x_i^T\beta)\}x_i + N^{-1/2}\sum_{i=1}^{N}\{Y_i - F(x_i^T\beta)\}(X_i - x_i)$,

and the second term in (6.7) also has zero mean and asymptotically negligible
variance. Assumptions (2.2), (2.3) and an appeal to the Lindeberg Central
Limit Theorem are used to show the first term in (6.7) is asymptotically
normally distributed with zero mean and covariance matrix $S$.

Write the fourth term in (6.6) as $B_1 + B_2$, where

$B_1 = -N^{-1/2}\sum_{i=1}^{N} \{(X_i - x_i)^T\beta\}^2 F^{(2)}(x_i^T\beta)\, x_i/2$,

$B_2 = -N^{-1/2}\sum_{i=1}^{N} \{(X_i - x_i)^T\beta\}^2 \{F^{(2)}(X_i^{*T}\beta) - F^{(2)}(x_i^T\beta)\}\, x_i/2$.

Assumption (2.2) and the $2+\delta$ moments of $\|\epsilon_i\|$ imply
$B_1 - E(B_1) = o_p(1)$. As for $B_2$, the inequality
$|F^{(2)}(x) - F^{(2)}(y)| \le \min(1, 3|x-y|)$ can be used to conclude

$\|B_2\| \le (\|\beta\|^2/2)\, N^{-1/2} m^{-1} \sum_{i=1}^{N} \|x_i\|\,\|\Sigma^{1/2}\epsilon_i\|^2 \min(1, 3\|\beta\|\, m^{-1/2}\|\Sigma^{1/2}\epsilon_i\|)$,

and hence (2.2) implies

$E(\|B_2\|) \le (\text{const.})\, E\{\|\epsilon_1\|^2 \min(1, 3\|\beta\|\, m^{-1/2}\|\Sigma^{1/2}\epsilon_1\|)\}$.

The Dominated Convergence Theorem together with the Markov Inequality are
used to show this last quantity converges to zero as $m \to \infty$ and thus

$B_1 + B_2 = E(B_1) + o_p(1) = -\lambda N^{-1}\sum_{i=1}^{N} \beta^T\Sigma\beta\, F^{(2)}(x_i^T\beta)\, x_i/2 + o_p(1)$.

Similarly the second term in (6.6) can be shown equal to

$-\lambda N^{-1}\sum_{i=1}^{N} \Sigma\beta\, F^{(1)}(x_i^T\beta) + o_p(1)$.

Combining the preceding facts and noting the definition of $c_{LU}$ establishes
the desired result.


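Proof of Theorems 1 & 2: In each summand appearing in (1.6) and (1.7) apply
the Mean Value Theorem to $F(\cdot)$ to arrive at

(6.8)  $\hat S_{LU}\, N^{1/2}(\hat\beta_{LU} - \beta) = N^{1/2}T_{LU,N}$,

(6.9)  $\hat S_F\, N^{1/2}(\hat\beta_F - \beta) = N^{1/2}(T_{LU,N} + D_N + R_N)$,

where

$\hat S_{LU} = N^{-1}\sum_{i=1}^{N} F^{(1)}(X_i^T\tilde\beta_{LU,i})\, X_i X_i^T$,

$\hat S_F = N^{-1}\sum_{i=1}^{N} F^{(1)}(\hat x_i^T\tilde\beta_{F,i})\, \hat x_i \hat x_i^T$.

For each $i$, $\tilde\beta_{LU,i}$ and $\tilde\beta_{F,i}$ lie on the line
segments joining $\hat\beta_{LU}$ and $\hat\beta_F$ to $\beta$ respectively. In
light of the previous results we need only show $\hat S_{LU}$ and $\hat S_F$
converge to $S$ in probability. We prove this for $\hat S_{LU}$ only; a similar
demonstration works for $\hat S_F$ as well.

Since $X_i - x_i = \Sigma^{1/2}\epsilon_i/m^{1/2}$ and by assumption
$m \to \infty$, it is not difficult to show

$\hat S_{LU} = N^{-1}\sum_{i=1}^{N} F^{(1)}(x_i^T\tilde\beta_{LU,i})\, x_i x_i^T + o_p(1)$.

Thus, omitting terms of order $o_p(1)$,

(6.10)  $\hat S_{LU} - S_N(\beta) = N^{-1}\sum_{i=1}^{N} \{F^{(1)}(x_i^T\tilde\beta_{LU,i}) - F^{(1)}(x_i^T\beta)\}\, x_i x_i^T$.

The norm of the right hand side of (6.10) is bounded by

(6.11)  $N^{1/2}\|\hat\beta_{LU} - \beta\| \left(\sup_{1\le i\le N} N^{-1/2}\|x_i\|\right) N^{-1}\sum_{i=1}^{N} \|x_i\|^2$.

Lemma 3 and (2.3) imply $N^{1/2}(\hat\beta_{LU} - \beta) = O_p(1)$ and hence (2.2)
implies that (6.11) is $o_p(1)$. By assumption $S_N(\beta) \to S$, which in turn
implies $\hat S_{LU} \to S$ in probability, completing the proof.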


We  now  outline  the  proof  of  Theorem  6  for  the  sufficiency  estimator. 


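Proof of Theorem 6 (Sufficiency Estimator): After a preliminary expansion we
arrive at

(6.12)  $S N^{1/2}(\hat\beta_S - \beta) = N^{-1/2}\sum_{i=1}^{N} T_i(\beta,\hat\Sigma)[Y_i - F(\beta^T T_i(\beta,\hat\Sigma))] + o_p(1)$
        $= I + II + III + IV + V + o_p(1)$,

where

(6.13)  $I = N^{-1/2}\sum_{i=1}^{N} [Y_i - F(X_i^T\beta)]\, X_i$,
        $II = -\beta^T\Sigma\beta\, m^{-1} N^{-1/2}\sum_{i=1}^{N} (Y_i - 1/2)\, F^{(1)}(X_i^T\beta)\, X_i$,
        $III = -(\beta^T\Sigma\beta)^2\, m^{-2} N^{-1/2}\sum_{i=1}^{N} F^{(2)}(X_i^T\beta)\, X_i/8$,
        $IV = \Sigma\beta\, m^{-1} N^{-1/2}\sum_{i=1}^{N} (Y_i - 1/2)[Y_i - F(X_i^T\beta)]$,
        $V = -\beta^T\Sigma\beta\,\Sigma\beta\, m^{-2} N^{-1/2}\sum_{i=1}^{N} F^{(1)}(X_i^T\beta)/4$.

In arriving at (6.13) we have used the fact that
$N^{1/2}(\hat\Sigma - \Sigma) = O_p(1)$. By using an argument similar to one
employed in Lemma 3 we may write

(6.14)  $I = N^{-1/2}\sum_{i=1}^{N} \{Y_i - F(x_i^T\beta)\}\, x_i$
        $- N^{-1/2}\sum_{i=1}^{N}\sum_{j=0}^{3} (X_i - x_i)\{(X_i - x_i)^T\beta\}^j F^{(j)}(x_i^T\beta)/j!$
        $- N^{-1/2}\sum_{i=1}^{N} x_i \sum_{j=1}^{4} \{(X_i - x_i)^T\beta\}^j F^{(j)}(x_i^T\beta)/j! + o_p(1)$.

Because of the bound on the $(4+\delta)$th moment of $\|\epsilon_i\|$, replacing
the last two terms in (6.14) by their expectations alters $I$ only by a term
which is $o_p(1)$. Thus, writing $F^{(j)}$ for $F^{(j)}(x_i^T\beta)$,

(6.15)  $I = S^{1/2}Z_N - N^{-1/2} m^{-1}\sum_{i=1}^{N} \{\Sigma\beta\, F^{(1)} + m^{-1}\Sigma^{1/2}Q\Sigma^{1/2}\beta\,(\beta^T\Sigma\beta)\, F^{(3)}/3!\}$
        $- N^{-1/2} m^{-1}\beta^T\Sigma\beta \sum_{i=1}^{N} \{x_i F^{(2)}/2 + m^{-1}\beta^T\Sigma^{1/2}Q\Sigma^{1/2}\beta\, x_i F^{(4)}/4!\} + o_p(1)$.

$Q$ is the matrix appearing in (3.4) and $Z_N$ has a limiting $N(0,I)$
distribution. Similarly,

$II = \beta^T\Sigma\beta\, N^{-1/2} m^{-1}\sum_{i=1}^{N} \{x_i F^{(2)}/2 - m^{-1}\Sigma\beta\, F^{(2)}(F - 1/2) - m^{-1}\beta^T\Sigma\beta\, x_i F^{(3)}(F - 1/2)/2\} + o_p(1)$,

$III = -(\beta^T\Sigma\beta)^2\, N^{-1/2} m^{-2}\sum_{i=1}^{N} x_i F^{(2)}/8 + o_p(1)$,

$IV = \Sigma\beta\, N^{-1/2} m^{-1}\sum_{i=1}^{N} \{F^{(1)} - m^{-1}(\beta^T\Sigma\beta)\, F^{(2)}(F - 1/2)/2\} + o_p(1)$,

$V = -\beta^T\Sigma\beta\,\Sigma\beta\, N^{-1/2} m^{-2}\sum_{i=1}^{N} F^{(1)}/4 + o_p(1)$.

Combining these terms and using the identities

$F^{(3)} = -3F^{(2)}(F - 1/2) - F^{(1)}/2$,
$F^{(4)} = -4F^{(3)}(F - 1/2) - F^{(2)}$,

we find

$S N^{1/2}(\hat\beta_S - \beta) = S^{1/2}Z_N - \lambda c_S + o_p(1)$,

proving the theorem.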
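
The two identities for the derivatives of the logistic function used in the last step, $F^{(3)} = -3F^{(2)}(F - 1/2) - F^{(1)}/2$ and $F^{(4)} = -4F^{(3)}(F - 1/2) - F^{(2)}$, are easy to confirm numerically. The short Python check below writes the derivatives of $F$ as polynomials in $F$ (these closed forms follow from $F^{(1)} = F(1-F)$ by repeated differentiation) and evaluates both identities on a grid; the function name and grid are arbitrary choices.

```python
import numpy as np

def logistic_derivatives(t):
    """F and its first four derivatives, expressed through F itself."""
    f = 1.0 / (1.0 + np.exp(-t))
    f1 = f * (1.0 - f)
    f2 = f1 * (1.0 - 2.0 * f)
    f3 = f2 * (1.0 - 2.0 * f) - 2.0 * f1 ** 2
    f4 = f3 * (1.0 - 2.0 * f) - 6.0 * f1 * f2
    return f, f1, f2, f3, f4

t = np.linspace(-8.0, 8.0, 401)
f, f1, f2, f3, f4 = logistic_derivatives(t)

# Residuals of the two identities; both should vanish up to rounding error.
resid3 = f3 - (-3.0 * f2 * (f - 0.5) - f1 / 2.0)
resid4 = f4 - (-4.0 * f3 * (f - 0.5) - f2)
print(np.max(np.abs(resid3)), np.max(np.abs(resid4)))
```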


REMARKS 

The modified estimators weaken the necessary condition for asymptotic normality from $Nm^{-2} = O(1)$ to $Nm^{-4} = O(1)$ at the expense of stronger conditions on the error law. As might be expected it is possible to play this game indefinitely. With appropriate assumptions on the first $2k$ moments of the error law one can construct a modified version of the naive estimator $\hat\beta_{LU}$ which is asymptotically normal provided $Nm^{-2k} = O(1)$ for any positive integer $k$. Details on this extension of the theory are available from the authors.


TABLES 


These  are  the  results  of  the  Monte-Carlo  study.  "Efficiency" 
refers  to  mean  square  error  efficiency  with  respect  to  ordinary 
logistic  regression. 


                ORDINARY   CORRECTED                 CORRECTED
                LOGISTIC   LOGISTIC    FUNCTIONAL    FUNCTIONAL    CLARK    SUFFICIENCY

Efficiency I    NA         201%        190%          193%          159%     194%
A NOTE FOR THE EDITOR AND REFEREES


Proof of Consistency: Of the four estimators introduced in section 1 we will prove consistency of $\hat\beta_{LU}$ and $\hat\beta_F$ employing assumptions (2.2)-(2.4) and the condition $\min(N,m) \to \infty$. The proof for the linearized (Clark) estimator is similar, while our convention regarding multiple solutions to (1.12) will insure consistency of the sufficiency estimator.

To present the proofs we need some additional notation. Write $x_i^T = (u_i^T, v_i^T)$ where $v_i$ corresponds to the components of $x_i$ measured with error. Analogously we write $\hat x_i^T = (u_i^T, \hat v_i^T)$ for $\hat x_i$ given in (1.8) and $X_i^T = (u_i^T, V_i^T)$ where $V_i = v_i + \epsilon_{im}$ and $E(\epsilon_{im}\epsilon_{im}^T) = m^{-1}\Sigma$. Let $H(\cdot) = \log F(\cdot)$ and note that $H(\cdot)$ has Lipschitz norm one and $|H(t)| \le 1 + |t|$ for all $t \in R^1$. Finally let


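(A.1)  $g_N(\gamma) = N^{-1}\sum_{i=1}^{N}\{F(x_i^T\beta)H(x_i^T\gamma) + F(-x_i^T\beta)H(-x_i^T\gamma)\}$

(A.2)  $G_N(\gamma) = N^{-1}\sum_{i=1}^{N}\{Y_iH(\hat x_i^T\gamma) + (1-Y_i)H(-\hat x_i^T\gamma)\}$

(A.3)  $L_N(\gamma,\{v_i\}_1^N) = N^{-1}\sum_{i=1}^{N}\{Y_iH(x_i^T\gamma) + (1-Y_i)H(-x_i^T\gamma)\}$
       $\quad - N^{-1}(m/2)\sum_{i=1}^{N}(V_i-v_i)^T\hat\Sigma^{-1}(V_i-v_i).$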

In defining (A.1) and (A.3) and elsewhere in the proof we use the sequence $\{v_i\}_1^N$ both to represent the true but unknown predictors (A.1) and as mathematical variables in the argument of the function $L_N$ (A.3). The context should make clear which interpretation is appropriate. $L_N$ is the normed functional log-likelihood assuming normal errors and replacing $\Sigma$ by the consistent estimate $\hat\Sigma$; thus by definition

(A.4)  $L_N(\hat\beta_F,\{\hat v_i\}_1^N) \ge L_N(\gamma,\{v_i\}_1^N)$


for all $\gamma \in R^p$ and $\{v_i\}_1^N \in R^{N(p-r)}$. But (A.4) implies that


* 


Thus $\hat\beta_F$ maximizes the random concave function $G_N(\cdot)$. Note that since $G_N(\cdot)$ is defined in terms of $\hat x_i$ it depends explicitly on $\hat\beta_F$ also. However this does not affect the validity of the inequality


(A.  5) 


The  naive  estimator  also  maximizes  a  certain  random  concave  function. 
Specifically  we  have 


(A.6)  $L_N(\hat\beta_{LU},\{V_i\}_1^N) \ge L_N(\gamma,\{V_i\}_1^N).$


The function $g_N(\cdot)$ is concave for each $N$, and (2.2) along with the inequality $|H(t)| \le 1+|t|$ implies that for each fixed $\gamma$, $\{g_N(\gamma)\}$ is a bounded sequence of real numbers. Although our assumptions do not imply that $\{g_N(\cdot)\}$ converges, it is true that every subsequence contains a further subsequence converging uniformly on closed bounded subsets of $R^p$ to some finite concave function (Rockafellar Thm. 10.9). Assumption (2.3) insures that the limit of every convergent subsequence possesses a unique maximum at $\beta$. Suppose for the moment that $G_N(\gamma) - g_N(\gamma) = o_p(1)$ for each fixed $\gamma$. Pick any subsequence $\{\hat\beta_{F,N_k}\}$ from $\{\hat\beta_{F,N}\}$ and let $\{g_{N_k}(\cdot)\}$ be the corresponding subsequence from $\{g_N(\cdot)\}$. Now from $\{g_{N_k}(\cdot)\}$ we can always choose a further subsequence $\{g_{N_{k,j}}(\cdot)\}$ which converges to some concave function $g(\cdot)$ with a unique maximum at $\beta$. Of course this implies $G_{N_{k,j}}(\gamma) - g(\gamma) = o_p(1)$, and since $\hat\beta_{F,N_{k,j}}$ maximizes $G_{N_{k,j}}(\cdot)$ an appeal to Theorem II.1 of Anderson and Gill (1982) implies $\hat\beta_{F,N_{k,j}} - \beta = o_p(1)$. This shows that every subsequence of $\{\hat\beta_{F,N}\}$ contains a further subsequence which converges in probability to $\beta$, which in turn implies $\hat\beta_{F,N} - \beta = o_p(1)$. Thus to prove consistency of $\hat\beta_F$ we need only show $G_N(\gamma) - g_N(\gamma) = o_p(1)$ for fixed $\gamma$. Similarly consistency of $\hat\beta_{LU}$ is established by showing $L_N(\gamma,\{V_i\}_1^N) - g_N(\gamma) = o_p(1)$. To complete the task we start with

Proposition 1: Assume (2.2) and suppose $\min(N,m) \to \infty$; then $L_N(\gamma,\{V_i\}_1^N) - g_N(\gamma) = o_p(1)$ for each fixed $\gamma$.

Proof: The quantity under investigation may be written as $T_1 + T_2$ where

$T_1 = N^{-1}\sum_{i=1}^{N}\{Y_iH(X_i^T\gamma) - F(x_i^T\beta)H(x_i^T\gamma)\}$,

$T_2 = N^{-1}\sum_{i=1}^{N}\{(1-Y_i)H(-X_i^T\gamma) - F(-x_i^T\beta)H(-x_i^T\gamma)\}$.

Furthermore

$T_1 = N^{-1}\sum_{i=1}^{N}\{Y_i[H(X_i^T\gamma) - H(x_i^T\gamma)]\} + N^{-1}\sum_{i=1}^{N}\{[Y_i - F(x_i^T\beta)]H(x_i^T\gamma)\}$

$\quad = T_{11} + T_{12}$ say.

The Lipschitz condition on $H$ implies

$|T_{11}| \le N^{-1}\sum_{i=1}^{N}|(X_i-x_i)^T\gamma| \le \|\gamma\|\,N^{-1}\sum_{i=1}^{N}\|\epsilon_{im}\|$.

The last expression is $o_p(1)$ provided $\min(m,N) \to \infty$. $T_{12}$ has zero mean and variance

$N^{-2}\sum_{i=1}^{N}F^{(1)}(x_i^T\beta)H^2(x_i^T\gamma) \le N^{-2}\sum_{i=1}^{N}\{(1+\|x_i\|\,\|\gamma\|)^2\}$,

which vanishes in the limit in view of (2.2). Thus $T_1 = o_p(1)$ and by an identical argument $T_2 = o_p(1)$, concluding the proof.
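Proposition 1 can be illustrated by simulation. The sketch below is only an illustration under assumptions that are not in the paper: a single scalar predictor with no intercept, a bounded uniform design, normal measurement errors with variance $\sigma^2/m$, and arbitrary values of $\beta$ and $\gamma$. It evaluates $L_N(\gamma,\{V_i\}_1^N)$, whose penalty term vanishes when the $v_i$ are set equal to the $V_i$, and compares it with $g_N(\gamma)$ from (A.1).

```python
import numpy as np

def H(t):
    """H(t) = log F(t) = -log(1 + exp(-t)), evaluated stably."""
    return -np.logaddexp(0.0, -t)

def prop1_gap(N, m, beta=1.0, gamma=0.7, sigma=1.0, seed=0):
    """|L_N(gamma, {V_i}) - g_N(gamma)| for one simulated data set (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=N)                    # true predictors (assumed design)
    p = 1.0 / (1.0 + np.exp(-beta * x))                   # F(x_i beta)
    Y = rng.binomial(1, p)                                # Bernoulli responses
    X = x + rng.normal(0.0, sigma / np.sqrt(m), size=N)   # observed predictors
    L = np.mean(Y * H(gamma * X) + (1 - Y) * H(-gamma * X))
    g = np.mean(p * H(gamma * x) + (1.0 - p) * H(-gamma * x))
    return abs(L - g)
```

Evaluating `prop1_gap` along a sequence in which both `N` and `m` grow shows the gap settling near zero, in line with the $o_p(1)$ statement.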

In addition to proving consistency of $\hat\beta_{LU}$, Proposition 1 yields the following two useful corollaries.


Corollary 1a:

$\Pr\{|L_N(\beta,\{V_i\}_1^N)| < 1\} \to 1$

Proof: From (A.1) and the definition of $H(\cdot)$,

$|g_N(\beta)| \le 2\sup_{0<t<1}|t\log t| < 1$

and by Proposition 1 $g_N(\beta) - L_N(\beta,\{V_i\}_1^N) = o_p(1)$.
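The numerical bound in the proof of Corollary 1a rests on $\sup_{0<t<1}|t\log t| = 1/e$, so that $2\sup|t\log t| = 2/e \approx 0.736 < 1$. A minimal grid check, with an illustrative function name and grid size:

```python
import math

def sup_abs_t_log_t(grid_size=100000):
    """Grid approximation of sup over 0 < t < 1 of |t log t|."""
    best = 0.0
    for k in range(1, grid_size):
        t = k / grid_size
        best = max(best, abs(t * math.log(t)))
    return best
```

The grid maximum is attained near $t = 1/e$ and agrees with $1/e$ to well within the grid resolution.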

Corollary 1b:

$\Pr\{N^{-1}(m/2)\sum_{i=1}^{N}\|V_i-\hat v_i\|^2 \le \|\Sigma\|\} \to 1$

Proof: By definition $L_N(\hat\beta_F,\{\hat v_i\}_1^N) \ge L_N(\beta,\{V_i\}_1^N)$ or equivalently

(A.7)  $N^{-1}\sum_{i=1}^{N}\{Y_iH(\hat x_i^T\hat\beta_F) + (1-Y_i)H(-\hat x_i^T\hat\beta_F)\} \ge L_N(\beta,\{V_i\}_1^N) + N^{-1}(m/2)\sum_{i=1}^{N}(V_i-\hat v_i)^T\hat\Sigma^{-1}(V_i-\hat v_i).$

Since the l.h.s. of (A.7) is almost surely nonpositive and $\Pr\{L_N(\beta,\{V_i\}_1^N) < -1\} \to 0$ it must be that

(A.8)  $\Pr\{N^{-1}(m/2)\sum_{i=1}^{N}(V_i-\hat v_i)^T\hat\Sigma^{-1}(V_i-\hat v_i) \le 1\} \to 1.$

The conclusion follows from the consistency of $\hat\Sigma$ and an application of the inequality $\|t\|^2 \le \|A\|\,t^TA^{-1}t$, true for all positive definite matrices $A$.
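The matrix inequality $\|t\|^2 \le \|A\|\,t^TA^{-1}t$ can be spot-checked numerically. The sketch below assumes $\|A\|$ denotes the spectral norm (the largest eigenvalue of a positive definite $A$); the paper does not say which matrix norm is intended, and the helper names are illustrative.

```python
import numpy as np

def inequality_slack(A, t):
    """Return ||A|| * t' A^{-1} t - ||t||^2, using the spectral norm of A."""
    spec = np.linalg.norm(A, 2)          # largest singular value of A
    quad = t @ np.linalg.solve(A, t)     # t' A^{-1} t without forming the inverse
    return float(spec * quad - t @ t)

def random_pd(p, rng):
    """A random symmetric positive definite p x p matrix (hypothetical test input)."""
    B = rng.normal(size=(p, p))
    return B @ B.T + 0.1 * np.eye(p)
```

The slack is nonnegative for every positive definite `A` and equals zero when `A` is a multiple of the identity.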
We are now in a position to complete the proof of consistency for $\hat\beta_F$.

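Proposition 2: In addition to (2.2) suppose $\min(m,N) \to \infty$ and $\hat\Sigma - \Sigma = o_p(1)$; then $G_N(\gamma) - g_N(\gamma) = o_p(1)$ for each fixed $\gamma$.

Proof: In light of Proposition 1 it suffices to show $G_N(\gamma) - L_N(\gamma,\{V_i\}_1^N) = o_p(1)$. Write this last quantity as $W_1 + W_2$ where

$W_1 = N^{-1}\sum_{i=1}^{N}\{Y_i(H(\hat x_i^T\gamma) - H(X_i^T\gamma))\}$

$W_2 = N^{-1}\sum_{i=1}^{N}\{(1-Y_i)(H(-\hat x_i^T\gamma) - H(-X_i^T\gamma))\}$

The Lipschitz condition on $H(\cdot)$ and Schwarz's inequality imply that for $j = 1, 2$

(A.9)  $|W_j| \le N^{-1}\sum_{i=1}^{N}|(X_i-\hat x_i)^T\gamma|$
       $\quad \le \|\gamma\|\,N^{-1}\sum_{i=1}^{N}\|X_i-\hat x_i\|$
       $\quad = \|\gamma\|\,N^{-1}\sum_{i=1}^{N}\|V_i-\hat v_i\|$
       $\quad \le \|\gamma\|\,\left(N^{-1}\sum_{i=1}^{N}\|V_i-\hat v_i\|^2\right)^{1/2}.$

The r.h.s. of (A.9) is $o_p(1)$ by virtue of Corollary 1b and this completes the proof.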